Materials+ML Workshop Day 4¶

Day 4 Agenda:¶

  • Questions about Day 3 Material
  • Review of Day 3

Content for today:

  • Data Manipulation:
    • The Pandas Package
    • Working with DataFrames
  • Visualizing Data
    • The Matplotlib package
    • Visualizing 1D data
    • Visualizing 2D and 3D data

Tentative Week 1 Schedule:¶

Session   Date                         Content
Day 1     06/09/2025 (2:00-4:00 PM)    Introduction, Python Data Types
Day 2     06/10/2025 (2:00-4:00 PM)    Python Functions and Classes
Day 3     06/11/2025 (2:00-4:00 PM)    Scientific Computing with Numpy and Scipy
Day 4     06/12/2025 (2:00-4:00 PM)    Data Manipulation and Visualization
Day 5     06/13/2025 (2:00-4:00 PM)    Materials Science Packages, Introduction to ML

Questions¶

Material covered yesterday:

  • Installing Python packages
  • Numpy
  • Scipy

Review: Day 3¶

Numpy Package¶

  • Numpy supplies mathematical functions (such as np.sin(x) and np.exp(x)) that operate on whole arrays at once
  • Numpy arrays (numpy.ndarray) are multi-dimensional data structures
  • These arrays can represent vectors, matrices, tensors, etc.
  • Creating Numpy arrays:
In [1]:
import numpy as np

# create a 1D array:
x = np.array([1.0, 2.0, 3.0, 4.0])
print(x)

# create a 2D array (matrix):
X = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])
print(X)
[1. 2. 3. 4.]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
  • Every array has a shape attribute, which is a tuple of integers
  • The length of the tuple is the number of dimensions of the array (also available as ndim)
  • The entries in the tuple give the size of the array along each axis (i.e. dimension)
In [2]:
# x is a 1D array of length 4:
print(x.shape)

# X is a 3x3 matrix:
print(X.shape)

# create an array of zeros with a 3x2x2 shape:
S = np.zeros((3,2,2))
print(S.shape)
(4,)
(3, 3)
(3, 2, 2)
  • Numpy arrays can be indexed like Python lists, but with some added features:
In [3]:
X = np.array(range(1,10)).reshape((3,3))
print(X)

# access row 0:
print('Accessing X[0]:')
print(X[0])

# access row 0, column 2:
print('Accessing X[0,2]:')
print(X[0,2])

# access column 0:
print('Accessing X[:,0]:')
print(X[:,0])
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Accessing X[0]:
[1 2 3]
Accessing X[0,2]:
3
Accessing X[:,0]:
[1 4 7]
  • Arithmetic operators (+, -, *, /, **) act elementwise on arrays
  • Numpy supports matrix multiplication with the @ operator
In [4]:
A = np.array(range(1,5)).reshape(2,2)
D = np.diag([1,2])

print('A:\n', A)
print('D:\n', D)

# elementwise addition:
print(A + D)

# matrix multiplication:
print(A @ D)
A:
 [[1 2]
 [3 4]]
D:
 [[1 0]
 [0 2]]
[[2 2]
 [3 6]]
[[1 4]
 [3 8]]
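The difference between the elementwise product and the matrix product is easy to mix up, so here is a minimal sketch that applies both * and @ to the same two matrices as the cell above. The variable names elementwise and matrix_product are illustrative only.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
D = np.diag([1, 2])

# elementwise product: multiplies entries at matching positions
elementwise = A * D

# matrix product: rows of A contracted with columns of D
matrix_product = A @ D

print('A * D:\n', elementwise)
print('A @ D:\n', matrix_product)
```

Because D is diagonal, A * D keeps only the diagonal of A (scaled), while A @ D scales each column of A.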
  • One important numpy function we will use a lot today is np.linspace:
In [5]:
start = 0.0
end = 10.0
n_pts = 11

# create a 1D array of uniform points:
x_pts = np.linspace(start, end, n_pts)
print(x_pts)
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
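A typical use of np.linspace is to build a grid of points and then evaluate a function on it. The sketch below assumes a small hypothetical grid on [0, 1]; the names grid, spacing, and values are not part of the workshop material.

```python
import numpy as np

# hypothetical grid: 5 points on the closed interval [0, 1]
grid = np.linspace(0.0, 1.0, 5)

# both endpoints are included, so 5 points span 4 equal intervals
spacing = grid[1] - grid[0]

# evaluate a function at every grid point in one vectorized call
values = grid**2

print(grid)
print(spacing)
print(values)
```

Note that the third argument is the number of points, not the step size; for a fixed step size, np.arange may be the more natural choice.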

Scipy Package¶

  • Scipy provides many useful subpackages for scientific computing
  • Subpackages you may find useful include:
    • scipy.constants: physical constants, unit conversions
    • scipy.optimize: functions for optimization and root finding
    • scipy.integrate: functions for numerical integration
    • scipy.stats: statistical analysis functions
    • scipy.special: special functions (e.g. Bessel functions)
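As a quick refresher, the following sketch touches three of the subpackages listed above. The specific problems (thermal energy at 300 K, the root of cos(x) = x, a Gaussian integral) are chosen here purely for illustration and are not taken from the Day 3 material.

```python
import numpy as np
from scipy import constants, integrate, optimize

# scipy.constants: Boltzmann constant (J/K) and the size of 1 eV in joules
k_B = constants.Boltzmann
eV = constants.electron_volt
thermal_energy_eV = k_B * 300.0 / eV  # k_B * T at T = 300 K, in eV

# scipy.optimize: find the root of cos(x) - x on the bracket [0, 1]
root = optimize.brentq(lambda x: np.cos(x) - x, 0.0, 1.0)

# scipy.integrate: integrate exp(-x^2) from 0 to infinity
integral, abs_error = integrate.quad(lambda x: np.exp(-x**2), 0.0, np.inf)

print('k_B*T at 300 K (eV):', thermal_energy_eV)
print('root of cos(x) = x:', root)
print('integral:', integral, ' exact:', np.sqrt(np.pi) / 2)
```

integrate.quad returns both the estimate and an estimate of the absolute error, which is why the result is unpacked into two names.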

New Content:¶

  • More Python packages:
    • Pandas (the name comes from "panel data")
    • Matplotlib ("MATLAB-like plotting library")

Checking if Packages are installed¶

  • The quickest way to check if a package is installed on your system is to import it:
In [6]:
import matplotlib
In [7]:
import pandas
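If you prefer to check for a package without actually importing it, the standard library can look it up on the import path. This is a minimal sketch; is_installed is a hypothetical helper name, not something provided by the workshop.

```python
import importlib.util

def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None

# report the status of the packages used today
for name in ['numpy', 'pandas', 'matplotlib']:
    status = 'installed' if is_installed(name) else 'missing'
    print(name, status)
```

A package reported as missing can then be installed with pip, as in the cells that follow.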

Installing Pandas:¶

In [8]:
!pip install pandas
Requirement already satisfied: pandas in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (2.2.3)
Requirement already satisfied: tzdata>=2022.7 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from pandas) (2023.3)
Requirement already satisfied: pytz>=2020.1 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from pandas) (2023.3)
Requirement already satisfied: python-dateutil>=2.8.2 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: numpy>=1.22.4 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from pandas) (2.2.6)
Requirement already satisfied: six>=1.5 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)

Installing Matplotlib:¶

In [9]:
!pip install matplotlib
Requirement already satisfied: matplotlib in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (3.10.3)
Requirement already satisfied: packaging>=20.0 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (23.1)
Requirement already satisfied: pillow>=8 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (9.5.0)
Requirement already satisfied: numpy>=1.23 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (2.2.6)
Requirement already satisfied: pyparsing>=2.3.1 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: fonttools>=4.22.0 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (4.39.4)
Requirement already satisfied: cycler>=0.10 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: python-dateutil>=2.7 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: contourpy>=1.0.1 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (1.0.7)
Requirement already satisfied: kiwisolver>=1.3.1 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from matplotlib) (1.4.4)
Requirement already satisfied: six>=1.5 in /media/colin/Shared/colin/git/materials-ml-workshop/env/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)

Pandas¶

  • Pandas is an open-source Python package for data manipulation and analysis.
  • It can be used for reading and writing data in several formats, including:
    • CSV (comma-separated values)
    • Excel spreadsheets
    • SQL databases
  • We can import pandas as follows:
In [10]:
import pandas as pd

DataFrames¶

  • Pandas introduces the DataFrame type for manipulating data
  • We can create DataFrames from Python dictionaries as follows:
In [13]:
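# Data on the first four elements of the periodic table.
# Mass is in atomic mass units (u); electronegativity is on the Pauling scale.
elements_data = {
    'Element' : ['H', 'He', 'Li', 'Be'],
    'Atomic Number' : [ 1, 2, 3, 4 ],
    'Mass' : [ 1.008, 4.003, 6.940, 9.012],
    # helium has no Pauling electronegativity; 0.0 is a placeholder value
    'Electronegativity' : [ 2.20, 0.0, 0.98, 1.57 ]
}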

# construct dataframe from data dictionary:
df = pd.DataFrame(elements_data)
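Once a DataFrame exists, a few attributes give a quick overview of what it contains. The sketch below rebuilds a two-column version of the table so that it runs on its own; the name df_small is illustrative.

```python
import pandas as pd

# a reduced version of the elements table, so this cell is self-contained
df_small = pd.DataFrame({
    'Element': ['H', 'He', 'Li', 'Be'],
    'Atomic Number': [1, 2, 3, 4],
})

print(df_small)                   # tabular view with an integer row index
print(df_small.shape)             # (number of rows, number of columns)
print(df_small.columns.tolist())  # column names, taken from the dictionary keys
print(df_small.dtypes)            # data type of each column
```

Each dictionary key becomes a column name and each list becomes that column's values, so all lists must have the same length.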

Tutorial: Working with Pandas DataFrames¶

  • Accessing DataFrame columns
  • Filtering DataFrames
  • Transforming Data
  • Importing and exporting data
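The four topics above can be previewed in a few lines. This is only a sketch on a small table of the same four elements, not the tutorial itself; the derived column 'Mass per Proton' and the in-memory CSV round trip are assumptions made here for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Element': ['H', 'He', 'Li', 'Be'],
    'Atomic Number': [1, 2, 3, 4],
    'Mass': [1.008, 4.003, 6.940, 9.012],
})

# accessing a column: returns a pandas Series
masses = df['Mass']

# filtering: keep only the rows where a boolean condition holds
heavy = df[df['Mass'] > 5.0]

# transforming: build a new column from existing ones (hypothetical quantity)
df['Mass per Proton'] = df['Mass'] / df['Atomic Number']

# exporting and importing: write CSV text to memory, then read it back
csv_text = df.to_csv(index=False)
df_roundtrip = pd.read_csv(io.StringIO(csv_text))

print(heavy)
print(df_roundtrip)
```

With real files, a path such as 'elements.csv' (a hypothetical filename) would replace the in-memory buffer in both to_csv and read_csv.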

Exercise: Working with Pandas DataFrames¶

  • Exploring the Periodic Table
    • Download the Periodic Table CSV file.
    • Answer the following questions:
      • What fraction of elements in the Periodic Table were discovered before 1900?
      • Which elements have at least 100 isotopes?
      • What is the average atomic mass of the radioactive elements?

Matplotlib¶

  • Matplotlib is a MATLAB-like plotting utility for creating publication-quality plots
  • In matplotlib, we typically import the pyplot subpackage with the alias plt:
In [12]:
import matplotlib.pyplot as plt

Tutorial: Plotting with Matplotlib¶

  • Plotting 1D data
  • Styling plots
  • Adding axis labels, titles, legends
  • Typesetting
  • Plotting in 3D
  • Saving figures
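To give a flavor of these topics, here is a sketch that draws a styled 1D plot next to a 3D surface and saves the result. The functions plotted, the colors, and the output filename day4_sketch.png are all arbitrary choices made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9, 3.5))

# left panel: 1D data with line styles, labels, title, and legend
ax1 = fig.add_subplot(1, 2, 1)
x = np.linspace(0.0, 2.0 * np.pi, 200)
ax1.plot(x, np.sin(x), color='tab:blue', linestyle='-', label=r'$\sin(x)$')
ax1.plot(x, np.cos(x), color='tab:red', linestyle='--', label=r'$\cos(x)$')
ax1.set_xlabel(r'$x$ (radians)')  # text between $...$ is typeset as math
ax1.set_ylabel(r'$f(x)$')
ax1.set_title('Sine and cosine')
ax1.legend()

# right panel: a 3D surface of a 2D Gaussian on a meshgrid
ax2 = fig.add_subplot(1, 2, 2, projection='3d')
X, Y = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
Z = np.exp(-(X**2 + Y**2))
ax2.plot_surface(X, Y, Z, cmap='viridis')
ax2.set_xlabel(r'$x$')
ax2.set_ylabel(r'$y$')
ax2.set_zlabel(r'$z$')

# save the figure to disk (hypothetical filename)
fig.savefig('day4_sketch.png', dpi=300, bbox_inches='tight')
```

In a Jupyter notebook the figure is displayed automatically after the cell runs; in a plain script, calling plt.show() would open it in a window.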

Exercises: Plotting with Matplotlib¶

  • Histograms
  • Chaotic Dynamical Systems
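As a warm-up for the histogram exercise, the following sketch bins synthetic data with ax.hist. The data here are random normal samples generated only for illustration; the exercise itself may use a different dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

# synthetic data: 1000 samples from a standard normal distribution
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

fig, ax = plt.subplots()

# hist returns the bin counts, the bin edges, and the drawn bar objects
counts, bin_edges, patches = ax.hist(samples, bins=30, edgecolor='black')

ax.set_xlabel('Sample value')
ax.set_ylabel('Count')
ax.set_title('Histogram of 1000 normal samples')
```

With bins=30 there are 31 bin edges, and the default range runs from the smallest to the largest sample, so every sample is counted.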

Recommended Reading:¶

  • Materials Science Python Packages

Bring your questions to our next meeting tomorrow!